Using Word Embeddings to Recognize Idioms
Authors
Abstract
Expressions such as add fuel to the fire can be interpreted literally or idiomatically depending on the context in which they occur. Many Natural Language Processing applications could improve their performance if idiom recognition were improved. Our approach is based on the idea that idioms violate cohesive ties in local contexts, while literal expressions do not. We propose two approaches: (1) compute the inner product of context word vectors with the vector representing a target expression. Since literal vectors predict their local contexts well, their inner product with context vectors should be larger than that of idiomatic ones, thereby distinguishing literal from idiomatic usages; and (2) compute literal and idiomatic scatter (covariance) matrices from local contexts in word vector space. Since the scatter matrices represent context distributions, we can then measure the difference between the distributions using the Frobenius norm. For comparison, we implement the methods of Fazly et al. (2009), Sporleder and Li (2009), and Li and Sporleder (2010b) and apply them to our data. We provide experimental results validating the proposed techniques.
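The abstract only sketches the two scoring schemes informally; below is a minimal NumPy sketch of how they could be realized, assuming pre-trained word vectors are available. The function names, the embedding lookup, and the way contexts are passed in are illustrative assumptions, not the authors' implementation.

import numpy as np

def projection_score(target_vec, context_words, embeddings):
    # Approach 1: sum of inner products between the target expression's
    # vector and the vectors of its surrounding context words. Literal
    # usages are expected to score higher, since literal vectors predict
    # their local contexts well.
    score = 0.0
    for w in context_words:
        if w in embeddings:
            score += float(np.dot(target_vec, embeddings[w]))
    return score

def scatter_matrix(context_vectors):
    # Scatter (covariance) matrix of a set of context word vectors.
    X = np.asarray(context_vectors, dtype=float)
    X = X - X.mean(axis=0)          # center the context distribution
    return X.T @ X / max(len(X) - 1, 1)

def scatter_distance(contexts_a, contexts_b):
    # Approach 2: Frobenius norm of the difference between two scatter
    # matrices, i.e. a distance between two context distributions.
    return np.linalg.norm(scatter_matrix(contexts_a) - scatter_matrix(contexts_b), ord='fro')

Under this sketch, a new occurrence of an expression would be labeled by whichever training scatter matrix (literal or idiomatic) its local-context scatter matrix is closer to in Frobenius distance, while the projection score serves as a direct literal-vs-idiomatic signal.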
Similar articles
Prime Time for PRIMES.
“In God we trust, all others bring data.” This pithy quote from W. Edwards Deming, one of the fathers of quality improvement (QI), speaks to the centrality of data and measurement in QI. Early on, hospitalists realized that QI in the hospital setting was key to their value proposition and return-on-investment strategy. However, pediatric hospitalists have long complained about the dearth of nationall...
A classification of hull operators in archimedean lattice-ordered groups with unit
The category, or class of algebras, in the title is denoted by $\bf W$. A hull operator (ho) in $\bf W$ is a reflection in the category consisting of $\bf W$ objects with only essential embeddings as morphisms. The proper class of all of these is $\bf hoW$. The bounded monocoreflection in $\bf W$ is denoted $B$. We classify the ho's by their interaction with $B$ as follows. A "word" is a function ...
Not All Neural Embeddings are Born Equal
Neural language models learn word representations that capture rich linguistic and conceptual information. Here we investigate the embeddings learned by neural machine translation models. We show that translation-based embeddings outperform those learned by cutting-edge monolingual models at single-language tasks requiring knowledge of conceptual similarity and/or syntactic role. The findings s...
Don't Worry, Be Happy: Harmonizing Differences
Most people's inability to communicate effectively and correctly makes their lives bitter and unpleasant, and can even cause a lapse from duty or abandonment of their work. Acknowledging and being sensitive to different personality types makes a big difference in how we perceive, behave toward, and encounter others. The Hartman color code is based on the belief that each of us is born with an inna...